 zero-shot performance


Efficient Equivariant Transfer Learning from Pretrained Models

Neural Information Processing Systems

Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks, even with limited data. Recent works of Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better-quality features for certain transformations than for others, so simply averaging them is deleterious. Hence, we propose λ-equitune, which averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022), used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance.
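The difference between uniform group averaging and importance-weighted averaging can be illustrated with a minimal sketch. It covers only the invariant case and assumes the group C4 of 90-degree image rotations. The names `toy_model` and `toy_weight` are hypothetical stand-ins: in the abstract the weights are learned by a small neural network, whereas here a fixed positive function is used, and the normalized weighted mean is one plausible reading of "averages the features using importance weights", not the authors' exact formulation.

```python
import numpy as np

def c4_orbit(x):
    """Return the four 90-degree rotations of a 2D array (the C4 orbit of x)."""
    return [np.rot90(x, k) for k in range(4)]

def equitune_invariant(model, x):
    """Uniform group averaging: plain mean of the features over the orbit."""
    feats = np.stack([model(gx) for gx in c4_orbit(x)])
    return feats.mean(axis=0)

def lambda_equitune_invariant(model, weight_fn, x):
    """Weighted group averaging: features are combined with positive
    importance weights and normalized by the sum of the weights."""
    orbit = c4_orbit(x)
    feats = np.stack([model(gx) for gx in orbit])        # shape (4, d)
    lams = np.array([weight_fn(gx) for gx in orbit])     # shape (4,), positive
    return (lams[:, None] * feats).sum(axis=0) / lams.sum()

# Hypothetical non-invariant "pretrained" feature extractor.
def toy_model(x):
    return np.array([x[0, 0], x[0].sum()])

# Hypothetical importance weight (stands in for a small learned network).
def toy_weight(x):
    return 1.0 + x[0, 0] ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 3))
out = lambda_equitune_invariant(toy_model, toy_weight, x)
out_rot = lambda_equitune_invariant(toy_model, toy_weight, np.rot90(x))
```

Because rotating the input only permutes the orbit, both averages give the same output for `x` and `np.rot90(x)` up to floating-point error, even though `toy_model` itself is not rotation-invariant.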


Compressing Large Language Models using Low Rank and Low Precision Decomposition

Neural Information Processing Systems

This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low-rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, which further enhances the zero-shot performance.
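To make the shape of the decomposition concrete, the following is a simplified sketch, not the CALDERA algorithm itself. It assumes a naive one-pass heuristic: quantize the weight matrix to obtain Q, take a truncated SVD of the residual, and quantize the two factors. The function names, the uniform quantizer, and the bit widths are all illustrative assumptions.

```python
import numpy as np

def uniform_quantize(a, bits):
    """Round entries to a symmetric uniform grid with 2**(bits-1) - 1
    positive levels (a simple stand-in for a real quantizer)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(a).max() / levels
    if scale == 0:
        return a.copy()
    return np.round(a / scale) * scale

def q_plus_lr(w, rank, q_bits=2, lr_bits=4):
    """Naive W ~ Q + L R: quantize W, then fit quantized low-rank
    factors to the quantization residual via truncated SVD."""
    q = uniform_quantize(w, q_bits)
    u, s, vt = np.linalg.svd(w - q, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    l = uniform_quantize(u[:, :rank] * root_s, lr_bits)          # (m, rank)
    r = uniform_quantize(root_s[:, None] * vt[:rank], lr_bits)   # (rank, n)
    return q, l, r

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 6))
q, l, r = q_plus_lr(w, rank=2)
w_hat = q + l @ r   # compressed stand-in for the layer's weight matrix
```

In this sketch Q carries a coarse, full-size approximation while L and R store a small higher-precision correction; because L and R are ordinary low-rank factors, they are also the natural place to apply low-rank adaptation.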


A Appendix A.1 UniBench Implementation Details We have developed UniBench

Neural Information Processing Systems

To evaluate new VLMs beyond the 59 already implemented, users need to follow Code Snippet 2. Users would need to create a class that inherits from [...] As described in Section 2.2, LLM-style models are defined as models that generate tokens/text as output, which makes them hard to compare with CLIP-style VLMs. Following the methodology of Matsuura et al. [2023], we evaluated Llava 1.5 [Liu et al., 2023], an LLM-style VLM, on various benchmark types in UniBench (Table 2). Scaling improves many benchmarks, but offers little benefit for reasoning and relation. Figure 8: Benchmark capabilities performance does not scale with dataset and model size. Median zero-shot performance of models on various benchmark capabilities.



Checklist 1. For all authors (a)

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's If you ran experiments (e.g. for benchmarks)... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See A.2 (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [Yes] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) For a detailed description and intended uses, please refer to 1. A.2 Dataset Accessibility We plan to host and maintain this dataset on HuggingFace. A.4 Dataset Examples Example question-answer pairs are provided in Tables 9, 10, and 11. Example Question "What does the symbol mean in Equation 1?" Answer "The symbol in Equation 1 represents "follows this distribution". "Can you provide more information about what is meant by 'generative process in "The generative process refers to Eq. (2), which is a conceptual equation representing Question "How does the DeepMoD method differ from what is written in/after Eq 3?" Answer "We add noise only to Question "How to do the adaptive attack based on Eq.(16)? "By Maximizing the loss in Eq (16) using an iterative method such as PGD on the end-to-end model we attempt to maximize the loss to cause misclassification while Question "How does the proposed method handle the imputed reward?"
"The proposed method uses the imputed reward in the second part of Equation 1, "Table 2 is used to provide a comparison of the computational complexity of the "Optimal number of clusters affected by the number of classes or similarity between "The authors have addressed this concern by including a new experiment in Table 4 of Question "Can you clarify the values represented in Table 1?" Answer "The values in Table 1 represent the number of evasions, which shows the attack "The experiments in table 1 do not seem to favor the proposed method much; softmax Can the authors explain why this might be the case?" Answer "The proposed method reduces to empirical risk minimization with a proper loss, and However, the authors hope that addressing concerns about the method's theoretical Question "Does the first row of Table 2 correspond to the offline method?"






LOVM: Language-Only Vision Model Selection

Neural Information Processing Systems

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings. However, selecting the best-performing VLM for a given downstream application is non-trivial, as it is dataset- and task-dependent. Meanwhile, the exhaustive evaluation of all available VLMs on a novel application is not only time-consuming and computationally demanding but also necessitates the collection of a labeled dataset for evaluation. As the number of open-source VLM variants increases, there is a need for an efficient model selection strategy that does not require access to a curated evaluation dataset. This paper proposes a novel task and benchmark for efficiently evaluating VLMs' zero-shot performance on downstream applications without access to the downstream task dataset.